How to Deal with Heterogeneous Data?
Abstract
In the context of Web 2.0, text content is often heterogeneous (i.e., it exhibits lexical heterogeneity). For instance, some words may be shortened or lengthened, and messages may include specific graphical elements (e.g., emoticons) or hashtags. Such content requires specific processing. For example, in an opinion classification task applied to the message "SimBig is an aaaaattractive conference!", results generally improve when the repeated characters (here, the extra a's) are removed. However, this normalization discards the information about sentiment intensity that the character elongation conveys. This example highlights the difficulty of dealing with heterogeneous textual content. The following sub-section describes heterogeneity according to document type (e.g., images and texts).
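The normalization trade-off described above can be sketched in Python. This is a minimal illustration, not the method used in this work: the function name `normalize_elongation`, the rule that three or more identical consecutive letters count as an elongation, the choice to collapse each run to a single letter, and the idea of returning the number of elongations as a separate intensity feature are all assumptions made for illustration.

```python
import re

# Hypothetical rule: a letter repeated three or more times in a row counts as
# an elongation. [^\W\d_] matches Unicode letters only, so digit runs such as
# "1000" are left untouched.
ELONGATION = re.compile(r"([^\W\d_])\1{2,}")


def normalize_elongation(text: str) -> tuple[str, int]:
    """Collapse elongated letters and count how many elongations were found.

    The count is returned separately so that the sentiment-intensity cue
    is kept as a feature instead of being discarded (illustrative assumption).
    """
    elongations = len(ELONGATION.findall(text))
    # Replace each run with a single occurrence of the repeated letter.
    normalized = ELONGATION.sub(r"\1", text)
    return normalized, elongations


if __name__ == "__main__":
    message = "SimBig is an aaaaattractive conference!"
    print(normalize_elongation(message))
    # ('SimBig is an attractive conference!', 1)
```

Collapsing to a single letter restores "attractive" here but would also turn "coooool" into "col"; a common alternative is to collapse runs to two letters, which in this example would leave "aattractive" instead. In either variant, a classifier could take the normalized text as input and the elongation count as an extra feature.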
Similar resources
Awareness of Radiological Accidents and How to Deal with It: A Study of Nurses and Nursing Faculties of Isfahan University of Medical Sciences
Introduction: Along with peaceful uses of ionizing radiation, its destructive applications have always threatened human life. One of the most important actions in calamities and disasters especially radiological and nuclear catastrophes is immediate assistance and medical care for victims. Therefore, knowing how to cope with nuclear and radiological disasters has become a part of modern care ed...
How to deal Baqie In Islamic Jurisprudence (Compliance with Islamic Penal Code)
Islamic regulations are the source of the crime of rebellion, which in the Islamic Penal Code of 1392 was separated from Moharebeh and became an independent crime. Given that rebellion is a specific, designated crime punishable by death, the principles of Islamic law call for offenders to be punished with great care. On this basis, the subject of this article...
Proposing a Robust Model of Interval Data Envelopment Analysis to Performance Measurement under Double Uncertainty Situations
When measuring performance with data envelopment analysis, it is essential to account for uncertainty in the data and to decide how to handle it, because a small deviation in the data can significantly change the performance results. In the real world, data are often uncertain. Interval data envelopment analysis is one of the most widely used approaches to d...
A Comparison of NSGA II and MOSA for Solving Multi-depots Time-dependent Vehicle Routing Problem with Heterogeneous Fleet
The time-dependent vehicle routing problem is one of the most applicable but least-studied variants of routing and scheduling problems. In this paper, a novel mathematical formulation of the time-dependent vehicle routing problem with a heterogeneous fleet, hard time windows, and multiple depots is proposed. To deal with traffic congestion, we also considered that the vehicles are not forced to come...
Semi-supervised Clustering on Heterogeneous Information Networks
Semi-supervised clustering on information networks combines labeled and unlabeled data with the aim of improving clustering performance. However, existing semi-supervised clustering methods are all designed for homogeneous networks and do not handle heterogeneous ones. In this work, we propose a semi-supervised clustering approach to analyze heterogeneous information netwo...
How to deal with massively heterogeneous cultural heritage data - lessons learned in CultureSampo
This paper presents the CultureSampo system from the viewpoint of publishing heterogeneous linked data as a service. It discusses the problems of converting legacy data into linked data, as well as the challenge of making massively heterogeneous yet interlinked cultural heritage content interoperable on a semantic level. In the approach described, the data is published not only for human u...